Conversation

wanderingai commented on Nov 22, 2023

This PR allows the monarch_cuda kernel to be built for compute capabilities sm_80, sm_89, and sm_90, which cover the following GPUs (see the flag sketch after the list):

  • A100
  • H100
  • L40
  • RTX 6000 Ada

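For context, here is a minimal sketch of what per-architecture nvcc flags typically look like; this is illustrative only, not the PR's exact code:

```python
# Illustrative only (not the PR's code): one -gencode flag per target
# architecture for the GPUs listed above.
nvcc_gencode_flags = [
    "-gencode=arch=compute_80,code=sm_80",  # Ampere: A100
    "-gencode=arch=compute_89,code=sm_89",  # Ada: L40, RTX 6000 Ada
    "-gencode=arch=compute_90,code=sm_90",  # Hopper: H100
]
```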
Additionally, setup.py is updated to enable builds based on nvcc availability alone, without requiring torch.cuda to be available at build time, which makes builds more flexible (e.g., in CUDA-enabled containers without an attached GPU).
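A minimal sketch of that kind of check, assuming CUDA_HOME/PATH-based detection (the PR's actual logic may differ):

```python
import os
import shutil

def nvcc_available() -> bool:
    """Best-effort check for nvcc without importing torch.cuda.

    Illustrative sketch; the PR's actual detection logic may differ.
    """
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if cuda_home and os.path.isfile(os.path.join(cuda_home, "bin", "nvcc")):
        return True
    return shutil.which("nvcc") is not None
```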

Update:

The compiler flags have been abstracted to support both PTX and SASS builds, while defaulting to the original Ampere-based PTX-only build, i.e. `-gencode=arch=compute_80,code=compute_80`.
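For readers unfamiliar with the distinction: `code=compute_80` embeds PTX, which the driver JIT-compiles at load time and runs on sm_80 and newer GPUs, whereas `code=sm_XX` embeds precompiled SASS for one specific architecture. A sketch of such a toggle, with a hypothetical `MONARCH_PTX_ONLY` environment variable standing in for whatever mechanism the PR actually uses:

```python
import os

# MONARCH_PTX_ONLY is a hypothetical name for illustration; the PR's
# actual toggle may differ.
if os.environ.get("MONARCH_PTX_ONLY", "1") == "1":
    # Default: the original Ampere PTX-only build; the driver JIT-compiles
    # the embedded PTX on sm_80 and newer GPUs.
    gencode_flags = ["-gencode=arch=compute_80,code=compute_80"]
else:
    # SASS build: precompiled machine code per architecture, no JIT at
    # load time (see the flag list above).
    gencode_flags = [
        f"-gencode=arch=compute_{cc},code=sm_{cc}" for cc in ("80", "89", "90")
    ]
```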

Successfully tested by building a Docker image and running the tests under tests/:


[Screenshots of the passing test runs omitted.]

DanFu09 (Contributor) commented on Nov 22, 2023

It looks like this PR is introducing some race conditions. When I install using this branch, some tests fail:

pytest -s -q tests/test_flashfftconv.py
Running 1120 items in this shard
......................................................................................................................................................................................................................
F.....................................................................................................................................................................................................................
................................................................F.F...F...............................................................................................................................................
......................................................................................................................................................................................................................
......................................................................................................................................................................................................................
..................................................

DanFu09 self-requested a review on November 22, 2023 at 04:18
michaelfeil commented:

@wanderingai Love this PR; it's an improvement over the previous setup.py. Can this be merged, worst case with no sm_90 flags by default?
